Mono Regression Benchmarking
Abstract
Regression benchmarking is a methodology for detecting performance changes in software by periodic benchmarking. Detecting performance regressions in particular helps to improve software quality, much like regression testing, which, however, focuses only on software functionality. To achieve an acceptable level of false alarms, regression benchmarking requires statistically sound planning and evaluation of the benchmarks. This paper presents a statistical method for automated detection of performance changes and shows how this method performs on a prototype of a regression benchmarking suite for Mono, a large open source project which implements the .Net platform.

Introduction

At present, the role of testing in software development is gaining importance relative to analysis. This shift, caused by factors such as large and distributed developer teams and the rising complexity of systems, is also backed methodologically by Extreme Programming [Jeffries et al., 2000]. Testing is often automated and carried out regularly, so that software bugs can be detected soon after they are introduced, which makes fixing them easier. Current regression testing covers only software functionality, neglecting performance as an important software quality factor. Regression benchmarking [Bulej et al., 2005, 2004] fills this gap by detecting performance regressions using benchmarking (in this paper, regression means performance degradation; it is not related to regression in statistics).

Compared to regression testing, regression benchmarking raises two additional issues: automated running of benchmarks and automated analysis of benchmark results. Running benchmarks is more complicated than running functionality tests because it has to be done non-intrusively, so that the results are not affected by unrelated system load. This is hard to achieve in the face of active monitoring, which is a necessity when benchmarking development versions of software. A generic environment for fully automated running of benchmarks was proposed in [Kalibera et al., 2004] and is currently under development. The Mono regression benchmarking suite uses its own less elaborate execution system, which is platform dependent and not distributed; this execution system nevertheless helps to fine-tune the requirements for the generic environment.

The analysis of benchmark results is in principle more complicated than the analysis of functionality tests. The aim of the analysis is to reliably detect performance changes. With the current complexity of software and hardware, benchmark results are often distorted by random effects. It is therefore necessary to assess the precision of the benchmark results and to take that precision into account when comparing the results of two consecutive versions of the investigated software. The method for detecting performance regressions described in this paper defines benchmark precision using the width of the confidence interval for the mean benchmark response time, and bases the detection of changes on the non-overlap of these confidence intervals. The confidence intervals are constructed with regard to random effects in memory allocation [Kalibera et al., 2005c], which are reflected by a hierarchical statistical model of the benchmark.
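To make the detection rule concrete, the following sketch simulates a two-level model of benchmark response times and applies the non-overlap test to two software versions. It is an illustration only, not the suite's actual implementation: the function names, the 99% confidence level, and the normal approximation to the Student t quantile are assumptions made here for brevity.

```python
# Illustrative sketch only (hypothetical names); not the suite's code.
import math
import random
import statistics

def simulate_benchmark(mu, run_sd, noise_sd, runs=10, samples=50):
    """Two-level model of benchmark response times:
        Y_ij = mu + A_i + e_ij
    where A_i is the random effect of run i (e.g. a particular memory
    layout) and e_ij is per-measurement noise. Returns per-run means."""
    run_means = []
    for _ in range(runs):
        a = random.gauss(0.0, run_sd)  # run-level random effect
        ys = [mu + a + random.gauss(0.0, noise_sd) for _ in range(samples)]
        run_means.append(statistics.mean(ys))
    return run_means

def mean_confidence_interval(run_means, z=2.576):
    """Confidence interval for the mean response time, computed from
    per-run means so that run-level random effects widen the interval.
    z = 2.576 approximates the two-sided 99% normal quantile; a real
    implementation would use the Student t quantile for small n."""
    n = len(run_means)
    m = statistics.mean(run_means)
    half = z * statistics.stdev(run_means) / math.sqrt(n)
    return m - half, m + half

def performance_changed(old_run_means, new_run_means):
    """Report a performance change when the intervals do not overlap."""
    lo_a, hi_a = mean_confidence_interval(old_run_means)
    lo_b, hi_b = mean_confidence_interval(new_run_means)
    return hi_a < lo_b or hi_b < lo_a

old = simulate_benchmark(mu=100.0, run_sd=2.0, noise_sd=5.0)
new = simulate_benchmark(mu=103.0, run_sd=2.0, noise_sd=5.0)
print(performance_changed(old, new))
```

Because the interval is computed from per-run means, run-to-run random effects such as differing memory layouts widen the interval and thus reduce false alarms, at the cost of requiring several independent runs per version.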
It has been shown that even seemingly innocuous changes, such as renaming an identifier, can significantly influence software performance, because they can modify the memory layout of the application and consequently alter the number of memory cache misses [Gu et al., 2004]. There is clearly a danger that an unknown effect similar to this one could be overlooked, making the results of any automated method for detecting performance changes unusable in practice. The Mono regression benchmarking suite was created as a test bed for the performance change detection method to prevent such a pessimistic scenario. The suite automatically benchmarks daily versions of Mono [Novell, Inc., 2005], an open source implementation of the .Net platform [ECMA, 2002]. Mono is a non-trivial software project, which includes an implementation of the C# compiler, class libraries, and a virtual machine with a just-in-time (JIT) compiler. The Mono regression benchmarking suite is fully automated, from downloading …
Similar resources
Precise Regression Benchmarking with Random Effects: Improving Mono Benchmark Results
Benchmarking as a method of assessing software performance is known to suffer from random fluctuations that distort the observed performance. In this paper, we focus on the fluctuations caused by compilation. We show that the design of a benchmarking experiment must reflect the existence of the fluctuations if the performance observed during the experiment is to be representative of reality. We...
An Efficiency Measurement and Benchmarking Model Based on Tobit Regression, GANN-DEA and PSOGA
The purpose of this study is to design a model based on Tobit regression, DEA, Artificial Neural Network, Genetic Algorithm and Particle Swarm Optimization to evaluate the efficiency of units and to benchmark the efficient and inefficient ones. This model has three stages, and it uses the data envelopment analysis model combined with a neural network, optimized by a genetic algorithm, to evaluate the ...
Generic Environment for Full Automation of Benchmarking
Regression testing is an important part of software quality assurance. We work to extend regression testing to include regression benchmarking, which applies benchmarking to detect regressions in performance. Given the specific requirements of regression benchmarking, many contemporary benchmarks are not directly usable in regression benchmarking. To overcome this, we present a case for designing ...
Regression Benchmarking Environment
Regression benchmarking, as a part of regression testing, is an application of benchmarking that aims at the automatic detection of performance regressions during application development. While automation is a crucial requirement for regression benchmarking, it is hard to meet with contemporary benchmarks, which are usually not fully automated. The paper gives an analysis of the ...
Quality Assurance in Performance: Evaluating Mono Benchmark Results
Performance is an important aspect of software quality. To prevent performance degradation during software development, performance can be monitored and software modifications that damage performance can be reverted or optimized. Regression benchmarking provides means for automated monitoring of performance, yielding a list of software modifications potentially associated with performance changes ...
Journal: WDS'05 Proceedings of Contributed Papers, Part I (ISBN 80-86732-59-2, MATFYZPRESS)
Pages: 19–24
Publication date: 2005